@jonstacks

What does this PR do?

Adds further changes to support Talos Linux.

Motivation

I saw that some work had already been done in #1765. I tried it out, but couldn't get it working without these changes.

Additional Notes

This PR is just to merge changes into the existing PR for this work.

Minimum Agent Versions

Are there minimum versions of the Datadog Agent and/or Cluster Agent required?

  • Agent: vX.Y.Z
  • Cluster Agent: vX.Y.Z

Describe your test plan

  1. Checked out the existing branch, built an image and pushed to GitHub container registry for testing.
VERSION=v1.14.0 IMG=ghcr.io/jonstacks/datadog-operator:v1.14.0-talos-patch make docker-build
docker image push ghcr.io/jonstacks/datadog-operator:v1.14.0-talos-patch
  2. Deployed with Argo into my Talos test cluster:
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: datadog-operator
  namespace: argocd
  finalizers:
  - resources-finalizer.argocd.argoproj.io
spec:
  destination:
    namespace: datadog-operator
    server: https://kubernetes.default.svc
  source:
    chart: datadog-operator
    repoURL: https://helm.datadoghq.com
    targetRevision: 2.9.2
    helm:
      releaseName: datadog-operator
      valuesObject:
        introspection:
          enabled: true
        image:
          repository: ghcr.io/jonstacks/datadog-operator
          tag: v1.14.0-talos-patch
          pullPolicy: Always
          doNotCheckTag: true  # needed so that the introspection flag gets passed
        imagePullSecrets:
        - name: ghcr-io
        logLevel: "debug"
  project: default
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
    - CreateNamespace=true
    - Validate=false
    retry:
      limit: 5
      backoff:
        duration: 5s
        maxDuration: 3m0s
        factor: 2
  3. Deployed the following DatadogAgent:
apiVersion: "datadoghq.com/v2alpha1"
kind: "DatadogAgent"
metadata:
  name: "datadog"
  namespace: "datadog-operator"
spec:
  global:
    clusterName: "home-cluster"
    site: "us5.datadoghq.com"

    credentials:
      apiSecret:
        secretName: "datadog-secret"
        keyName: "api-key"

    kubelet:
      tlsVerify: false
    
    tags:
    - "env:dev"

  features:
    clusterChecks:
      enabled: true
      useClusterChecksRunners: true
    kubeStateMetricsCore:
      enabled: true
    logCollection:
      enabled: true
      containerCollectAll: false
    orchestratorExplorer:
      enabled: true

Checklist

  • PR has at least one valid label: bug, enhancement, refactoring, documentation, tooling, and/or dependencies
  • PR has a milestone or the qa/skip-qa label

@jonstacks jonstacks requested a review from a team as a code owner May 18, 2025 07:56
    },
    Status: corev1.NodeStatus{
        NodeInfo: node.Status.NodeInfo,
    },
Author

Without this, r.getNodeList returned nodes with an empty status, so we couldn't check for Talos Linux. This copies each node's NodeInfo into the cache so that NodeInfo is populated when we list the nodes later.
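
To illustrate the idea, here is a minimal sketch of a node-stripping step that keeps NodeInfo. The types are simplified stand-ins for the corev1 types of the same names, and stripNode is a hypothetical helper, not the function changed in this PR.

```go
package main

import "fmt"

// NodeSystemInfo, NodeStatus and Node are simplified stand-ins for the
// corev1 types of the same names; only the fields used here are modeled.
type NodeSystemInfo struct {
	OSImage       string
	KernelVersion string
}

type NodeStatus struct {
	NodeInfo NodeSystemInfo
	Images   []string // stand-in for bulky status data we do not want cached
}

type Node struct {
	Name   string
	Labels map[string]string
	Status NodeStatus
}

// stripNode is a hypothetical cache transform: it drops bulky fields but
// carries NodeInfo over so later list calls can still inspect the OS image.
func stripNode(n Node) Node {
	return Node{
		Name:   n.Name,
		Labels: n.Labels,
		Status: NodeStatus{
			NodeInfo: n.Status.NodeInfo,
		},
	}
}

func main() {
	full := Node{
		Name: "worker-1",
		Status: NodeStatus{
			// Example value only; the exact string depends on the node.
			NodeInfo: NodeSystemInfo{OSImage: "Talos (v1.10.0)"},
			Images:   []string{"image-a", "image-b"},
		},
	}
	cached := stripNode(full)
	fmt.Printf("os image: %q, cached images: %d\n",
		cached.Status.NodeInfo.OSImage, len(cached.Status.Images))
}
```

If the Status block were left out of the returned value, OSImage would come back empty from the cache, which matches the symptom described above.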

    return nil
case TalosProvider:
    // If only the Talos provider exists, there should be no affinity override.
    return nil
Author

By default, Talos doesn't seem to apply a node label that we can select on, the way Google COS does. For a cluster that mixes Talos Linux nodes with other nodes, I don't have a good solution yet.
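
As a rough sketch of the problem, the snippet below classifies nodes by their reported OS image and flags a mixed cluster. It assumes Talos nodes report an OS image string containing "Talos"; the provider names and helper functions are hypothetical and are not taken from the operator.

```go
package main

import (
	"fmt"
	"strings"
)

// Hypothetical provider names used only for this sketch.
const (
	talosProvider   = "talos"
	defaultProvider = "default"
)

// providerFromOSImage guesses a provider from a node's NodeInfo.OSImage.
// Assumption: Talos nodes report an OS image string containing "Talos".
func providerFromOSImage(osImage string) string {
	if strings.Contains(strings.ToLower(osImage), "talos") {
		return talosProvider
	}
	return defaultProvider
}

// countProviders tallies nodes per provider so a caller can tell a
// Talos-only cluster from a mixed one.
func countProviders(osImages []string) map[string]int {
	counts := make(map[string]int)
	for _, img := range osImages {
		counts[providerFromOSImage(img)]++
	}
	return counts
}

// isMixed reports whether more than one provider is present.
func isMixed(counts map[string]int) bool {
	return len(counts) > 1
}

func main() {
	// Example OS image strings; real values come from each node's status.
	images := []string{"Talos (v1.10.0)", "Talos (v1.10.0)", "Ubuntu 22.04.4 LTS"}
	counts := countProviders(images)
	fmt.Println("talos nodes:", counts[talosProvider])
	fmt.Println("mixed cluster:", isMixed(counts))
}
```

Detection alone does not solve scheduling: without a label on the Talos nodes, there is still nothing for a node affinity rule to match, which is the open question here.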

@rothgar

rothgar commented Jun 3, 2025

Does this also avoid mounting /etc/passwd, as the other issue mentioned? I tried with the provider value for the Helm chart and still got this error:

failed to mkdir "/etc/passwd": mkdir /etc/passwd: read-only file system

@jonstacks
Author

@rothgar, sorry for the delay. Yes, it does avoid mounting /etc/passwd. Currently, you have to build a custom image from this branch, push it to a registry, and pass it as a value to the Helm chart, as I do with Argo above.

Here are the generated volumeMounts on the datadog-agent-talos statefulset:

        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /var/log/datadog
          name: logdatadog
        - mountPath: /checks.d
          name: checksd
          readOnly: true
        - mountPath: /etc/datadog-agent/auth
          name: datadog-agent-auth
        - mountPath: /conf.d
          name: confd
          readOnly: true
        - mountPath: /etc/datadog-agent
          name: config
        - mountPath: /host/proc
          name: procdir
          readOnly: true
        - mountPath: /host/var/run
          name: runtimesocketdir
          readOnly: true
      restartPolicy: Always
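
For illustration only, here is one way a provider-specific override could drop a mount such as /etc/passwd. VolumeMount is a simplified stand-in for corev1.VolumeMount, withoutMountPaths is a hypothetical helper, and the "passwd" volume name is assumed; none of this is the operator's actual implementation.

```go
package main

import "fmt"

// VolumeMount is a simplified stand-in for corev1.VolumeMount.
type VolumeMount struct {
	Name      string
	MountPath string
	ReadOnly  bool
}

// withoutMountPaths returns a copy of mounts that leaves out every entry
// whose MountPath appears in skip. It is a hypothetical helper for this
// sketch, not a function from the operator.
func withoutMountPaths(mounts []VolumeMount, skip ...string) []VolumeMount {
	skipSet := make(map[string]struct{}, len(skip))
	for _, p := range skip {
		skipSet[p] = struct{}{}
	}
	kept := make([]VolumeMount, 0, len(mounts))
	for _, m := range mounts {
		if _, drop := skipSet[m.MountPath]; drop {
			continue
		}
		kept = append(kept, m)
	}
	return kept
}

func main() {
	mounts := []VolumeMount{
		{Name: "config", MountPath: "/etc/datadog-agent"},
		{Name: "passwd", MountPath: "/etc/passwd", ReadOnly: true}, // assumed volume name
		{Name: "procdir", MountPath: "/host/proc", ReadOnly: true},
	}
	for _, m := range withoutMountPaths(mounts, "/etc/passwd") {
		fmt.Println(m.Name, m.MountPath)
	}
}
```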

I run Talos in my homelab, but my Datadog free trial ran out 😆, and that was enough friction to stop development on this for now. I'll see if they'll grant me an extension.

@dd-octo-sts

dd-octo-sts bot commented Oct 12, 2025

This pull request has been automatically marked as stale because it has not had activity in the past 15 days.

It will be closed in 30 days if no further activity occurs. If this pull request is still relevant, adding a comment or pushing new commits will keep it open. Also, you can always reopen the pull request if you missed the window.

Thank you for your contributions!

@dd-octo-sts dd-octo-sts bot added the stale label Oct 12, 2025